New AI Tool Searches Millions of Historical Newspaper Pages

2020-10-01

  • A new search tool uses machine learning to search millions of U.S. newspaper pages for historical pictures.
  • The U.S. Library of Congress recently launched the tool, called Newspaper Navigator. The online search system is available for free to the public.
  • The Library of Congress is the world's largest library. It offers materials from the creative record of the United States. The library serves as the main research service for the U.S. Congress.
  • Newspaper Navigator currently permits users to search more than 16 million pages from newspapers across the country, from 1900 to 1963.
  • The newspaper pages were digitized for another Library of Congress project, called Chronicling America. This tool also permits searches across the library's 16 million newspaper pages. The pages contain more than 1.5 million images.
  • The Chronicling America system permits users to find and look at full newspaper pages as digitized images. Users can also search the collection by keyword, using optical character recognition, or OCR. OCR is software that identifies printed characters in an image of a page, so that the text can be searched or turned into digital text.
  • But OCR only reads text. This meant that people using the Chronicling America site had to look through newspaper pages themselves when trying to find specific images. The new Newspaper Navigator tool offers the ability to search the image content of the collection.
  • This is where the machine-learning methods come in. The search system was trained to recognize different kinds of images. For example, it was designed to tell the difference between photos, maps, comics, advertisements, etc. It can also identify similar images and return these in search results.
  • Benjamin Lee created the system. He is a member of the Library of Congress' Innovator in Residence Program. The program was established to sponsor people from different fields to create new ways to present the library's huge historical collections to the public.
  • Lee trained a machine-learning model to identify the visual content and then ran the model over all 16 million pages in Chronicling America.
  • He based the training of his model on another Library of Congress experiment called Beyond Words. That project invited members of the public to help identify cartoons, drawings, pictures and advertisements in newspapers from the World War I period.
  • Lee said that after he learned of the Beyond Words experiment, he saw a great possibility to use that information to power his machine-learning tool. "I began to wonder whether this identified visual content was the key to throwing open the treasure chest of visual content, throughout all 16 million pages in Chronicling America."
  • Newspaper Navigator works like other search engines. Users enter a search term in the "keyword" box. They can also choose to limit search results by location, as well as by date.
  • But one of the most powerful tools in the system is the ability to search images by visual similarity. Users of the tool can save images to a personal "collection." They can then use those images as a basis for finding other visually similar images across the library's full collection.
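The story does not say how Newspaper Navigator measures visual similarity. One common approach, assumed here only for illustration, is to turn each image into a list of numbers (an "embedding") and then rank images by how closely their numbers point in the same direction. The image names and three-number vectors below are made up, and the function names are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_by_similarity(query, library, top_k=3):
    """Return the ids of the top_k library images most similar to the query vector."""
    scored = [(cosine_similarity(query, vec), image_id)
              for image_id, vec in library.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [image_id for _, image_id in scored[:top_k]]

# Made-up embeddings; a real system would use hundreds of numbers per image.
library = {
    "map_1905": [0.9, 0.1, 0.0],
    "map_1932": [0.8, 0.2, 0.1],
    "comic_1921": [0.0, 0.9, 0.3],
    "photo_1918": [0.1, 0.2, 0.9],
}
saved_image = [1.0, 0.1, 0.0]  # stands in for an image saved to a "collection"

print(rank_by_similarity(saved_image, library, top_k=2))
# prints ['map_1905', 'map_1932']
```

In this sketch the two map images rank first because their vectors are closest in direction to the saved image.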
  • The system even permits users to "retrain" the machine-learning tool for individual searches. This is done by examining the images that the search returns. By selecting whether images found were similar or not similar to the desired result, the user is "retraining" the system to improve its search performance.
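The story does not describe how the "retraining" is carried out inside Newspaper Navigator. As a rough sketch of the general idea, the example below uses a well-known technique called Rocchio-style relevance feedback, which is an assumption and not the tool's confirmed method. The search vector is moved toward images the user marked as similar and away from images marked as not similar. All names, vectors and weights are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0

def mean_vector(vectors, size):
    """Average a list of vectors; an empty list gives a zero vector."""
    if not vectors:
        return [0.0] * size
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(size)]

def refine_query(query, similar, not_similar, alpha=1.0, beta=0.75, gamma=0.25):
    """Rocchio-style update: pull the query toward 'similar' examples
    and push it away from 'not similar' examples."""
    size = len(query)
    pos = mean_vector(similar, size)
    neg = mean_vector(not_similar, size)
    return [alpha * q + beta * p - gamma * n for q, p, n in zip(query, pos, neg)]

# Made-up two-number embeddings for two search results.
map_image = [0.9, 0.1]
comic_image = [0.1, 0.9]
query = [0.5, 0.5]  # the first search scores both images equally

# The user marks the map as similar and the comic as not similar.
new_query = refine_query(query, similar=[map_image], not_similar=[comic_image])

print(cosine(query, map_image), cosine(new_query, map_image))
print(cosine(query, comic_image), cosine(new_query, comic_image))
```

After the update, the map image scores higher against the search vector than before and the comic scores lower, so the next round of results leans toward what the user wanted.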
  • A demonstration of the Newspaper Navigator is available to help users learn more about the tool and how to carry out different searches. The creators hope the tool can be useful for historians, reporters, educators, professional researchers or anyone interested in learning about U.S. history through newspapers.
  • The Library of Congress notes that all images included in Newspaper Navigator and Chronicling America are in the public domain, meaning people are free to use them as they wish.
  • I'm Bryan Lynn.
  • Bryan Lynn wrote this story for VOA Learning English, based on reports from the Library of Congress. Ashley Thompson was the editor.
  • ________________________________________________________________
  • Words in This Story
  • page - n. one part of a website
  • digitize - v. to put information into the form of a series of numbers, usually so that it can be understood by a computer
  • character - n. a letter, number or other mark or sign used in writing or printing
  • comics - n. a series of pictures that tell a story
  • content - n. information contained in a piece of writing, a speech, a movie or on the internet
  • visual - adj. related to seeing
  • sponsor - v. to pay for someone to do something or for something to happen
  • location - n. place where something takes place